In earlier papers, we have presented motion-compensated television
coding schemes in which the displacement of objects was recursively 
estimated using a steepest descent algorithm that minimized the 
square of the intensity prediction error at each picture element. In 
this paper, we present extensions in which displacement is estimated 
by considering the prediction error at several picture elements. These 
extensions are more complex, but they significantly improve the 
performance of the displacement estimation in those cases where the 
displacement is spatially uniform. However, in real scenes containing 
large spatial variations of displacement, only a small improvement 
is obtained. For one scene containing a head-and-shoulders view of 
a person engaged in active conversation, an improvement of about 10 
percent in average bit rate was obtained over our previous motion- 
compensation scheme. 

I. INTRODUCTION 

The displacement estimation algorithms described in this paper 
estimate the displacement of objects in successive frames of a television 
scene. They are a generalization of the pel-recursive displacement 
estimation algorithm that we had introduced earlier. 1,2 Before describ- 
ing the generalization, it is useful to first outline the pel-recursive 
displacement estimation algorithm. Let $I(\mathbf{x}_k, t)$ denote the intensity of
a scene at the $k$th sample point $\mathbf{x}_k$ from a scan line and let
$I(\mathbf{x}_k, t - \tau)$ denote the intensity at the same spatial location in the previous
frame.* If the scene consists of an object that is undergoing pure
translation under uniform illumination, then, disregarding the background,

$$I(\mathbf{x}_k, t) = I(\mathbf{x}_k - \mathbf{D}, t - \tau), \tag{1}$$



* Subscript $k$ in $\mathbf{x}_k$ is used to denote the sample number in the same order as scanning.
where $\mathbf{D}$ is the displacement (two-component vector) of the object in
one frame interval $\tau$. The pel-recursive algorithm obtains an estimate
of $\mathbf{D}$ (i.e., $\hat{\mathbf{D}}$) by recursively minimizing the square of the displaced
frame difference at the current pel location. The displaced frame
difference $\mathrm{DFD}(\cdot, \cdot)$ is defined by

$$\mathrm{DFD}(\mathbf{x}_k, \hat{\mathbf{D}}) = I(\mathbf{x}_k, t) - I(\mathbf{x}_k - \hat{\mathbf{D}}, t - \tau). \tag{2}$$

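The minimization is done by a steepest descent algorithm of the form

$$\hat{\mathbf{D}}_{i+1} = \hat{\mathbf{D}}_i - \tfrac{1}{2}\,\epsilon\, \nabla_{\hat{\mathbf{D}}_i} \left[\mathrm{DFD}(\mathbf{x}_k, \hat{\mathbf{D}}_i)\right]^2, \tag{3}$$

where $\nabla_{\hat{\mathbf{D}}_i}[\cdot]$ is the two-dimensional gradient with respect to $\hat{\mathbf{D}}_i$.
Equation (3) can be expanded to

$$\hat{\mathbf{D}}_{i+1} = \hat{\mathbf{D}}_i - \epsilon\, \mathrm{DFD}(\mathbf{x}_k, \hat{\mathbf{D}}_i)\, \nabla I(\mathbf{x}_k - \hat{\mathbf{D}}_i, t - \tau), \tag{4}$$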

where $\nabla = \nabla_{\mathbf{x}}$ is the two-dimensional spatial gradient operator with
respect to horizontal and vertical coordinates of vector $\mathbf{x}$.* Having
computed displacement, the motion-compensated coder predicts intensity,
$I(\mathbf{x}_k, t)$, by the displaced previous frame intensity $I(\mathbf{x}_k - \hat{\mathbf{D}}_i, t - \tau)$
using interpolation for nonintegral (in terms of pel distances) values
of $\hat{\mathbf{D}}_i$. The displacement at either the previous pel or the previous line
element is used to predict the intensity of the present pel. This allows
the receiver to compute displaced previous frame intensity (or the
prediction) without explicit transmission of the displacement. If the
magnitude of the prediction error exceeds a predetermined threshold,
the coder transmits a quantized version of $\mathrm{DFD}(\mathbf{x}_k, \hat{\mathbf{D}}_i)$ and the necessary
addressing information to the receiver.
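
The single-pel recursion of eq. (4) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text calls for interpolation at nonintegral displacements but does not name a method, so bilinear interpolation is assumed here, and the spatial gradient is approximated by central differences. The function names (`bilinear`, `gradient`, `pel_recursive_step`) are hypothetical and do not come from the paper.

```python
def bilinear(img, x, y):
    """Intensity of img (a list of rows) at real-valued (x, y), clamped to the border.
    Bilinear interpolation is an assumption; the paper only says 'interpolation'."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = min(int(x), w - 2), min(int(y), h - 2)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * img[y0][x0] + fx * img[y0][x0 + 1]
    bot = (1 - fx) * img[y0 + 1][x0] + fx * img[y0 + 1][x0 + 1]
    return (1 - fy) * top + fy * bot


def gradient(img, x, y):
    """Central-difference estimate of the spatial gradient at (x, y) (an assumption)."""
    gx = 0.5 * (bilinear(img, x + 1, y) - bilinear(img, x - 1, y))
    gy = 0.5 * (bilinear(img, x, y + 1) - bilinear(img, x, y - 1))
    return gx, gy


def pel_recursive_step(cur, prev, x, y, d, eps):
    """One update of eq. (4) at the integer pel (x, y).
    d = (dx, dy) is the current displacement estimate; returns the new estimate."""
    px, py = x - d[0], y - d[1]                 # displaced location in previous frame
    dfd = cur[y][x] - bilinear(prev, px, py)    # displaced frame difference, eq. (2)
    gx, gy = gradient(prev, px, py)
    return (d[0] - eps * dfd * gx, d[1] - eps * dfd * gy)
```

Calling `pel_recursive_step` once per pel along a scan line, feeding each returned estimate into the next call, reproduces the pel-to-pel recursion described above for a purely translating pattern.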

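The extensions of this paper consider the displaced frame differences
at many picture elements to estimate $\mathbf{D}$. For example, $\hat{\mathbf{D}}$ can be
updated from sample to sample by using a steepest-descent algorithm
to minimize a weighted sum of the squared displaced frame differences
at some previously transmitted neighboring picture elements. Thus,

$$\hat{\mathbf{D}}_{i+1} = \hat{\mathbf{D}}_i - \epsilon \cdot \tfrac{1}{2} \cdot \nabla_{\hat{\mathbf{D}}_i} \left[ \sum_{j=0}^{p} W_j \left[\mathrm{DFD}(\mathbf{x}_{k-j}, \hat{\mathbf{D}}_i)\right]^2 \right], \tag{5}$$

where $W_j \ge 0$ and $\sum_{j=0}^{p} W_j = 1$.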

We consider two variations of this algorithm. In one, the displace- 
ment is estimated by steepest descent on the weighted sum of the 
squared displaced frame differences. This corresponds to eq. (5) above. 
In the other algorithm, displacement is estimated by a least-mean- 
square approximation using a specified number of neighboring picture 



* Subscript i is used to denote the iteration number. Since the displacement estimate 
may not be revised at each picture element, in general, i may not be the same as k at a 
given picture element. Also in some cases, there could be many iterations for the same 
$k$ (or $\mathbf{x}_k$).
elements and a previous estimate of displacement. As in our previous
pel-recursive estimator, these estimators can also be generalized to the 
transform domain. 3,4 

This paper is organized as follows. The next section gives a detailed 
description of the two algorithms. Section III gives results of simula- 
tions on synthetic computer-generated scenes as well as a real scene 
containing complex motion. It is seen that both the estimators signifi- 
cantly improve the performance of the displacement estimator for the 
synthetic scene. In the case of the real scenes, however, both the 
estimators give only about 10-percent improvement in the bit rate at 
a significant increase in the complexity. 

II. DESCRIPTION OF THE ALGORITHMS 

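In this section, we develop the two algorithms for estimating displacement.
Algorithm 1, called gradient of summed error (GSE), works
as follows. Let $\mathbf{x}_k$ be the current picture element and $\hat{\mathbf{D}}_i$ be the estimate
of displacement at the $i$th iteration. This estimate is revised by using
the steepest descent on the summed error given by

$$\sum_{j=0}^{p} W_j \left[ I(\mathbf{x}_{k-j}, t) - I(\mathbf{x}_{k-j} - \hat{\mathbf{D}}_i, t - \tau) \right]^2 \tag{6}$$

and, therefore, the $(i+1)$th estimate of $\mathbf{D}$, i.e., $\hat{\mathbf{D}}_{i+1}$, is given by

$$\hat{\mathbf{D}}_{i+1} = \hat{\mathbf{D}}_i - \epsilon \left[ \sum_{j=0}^{p} W_j\, \mathrm{DFD}(\mathbf{x}_{k-j}, \hat{\mathbf{D}}_i) \cdot \nabla I(\mathbf{x}_{k-j} - \hat{\mathbf{D}}_i, t - \tau) \right]. \tag{7}$$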

This can be generalized further using a different function of the
errors (i.e., $\mathrm{DFD}(\cdot, \cdot)$) instead of the square function. The difference
between eqs. (7) and (4) is that, to estimate a new value $\hat{\mathbf{D}}_{i+1}$ of $\mathbf{D}$ from
the old value $\hat{\mathbf{D}}_i$, the displaced frame difference is evaluated at several
neighboring picture elements rather than just one picture element.
This has the effect of smoothing the update term [that is, the second
term on the right-hand side of eq. (4)]. As in our earlier papers, the
displacement is revised only at those picture elements where the
magnitude of the frame difference, $I(\mathbf{x}_k, t) - I(\mathbf{x}_k, t - \tau)$, is higher
than a threshold. Thus, the displacement is revised only in the "moving
areas."
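
As a rough illustration of how eq. (7) differs from eq. (4), the sketch below applies one gradient-of-summed-error update given displaced frame differences and previous-frame gradients already evaluated at the neighboring pels. The function name `gse_update` and its argument layout are hypothetical; how the differences and gradients are measured is left to the caller.

```python
def gse_update(d, dfds, grads, weights, eps):
    """One update of eq. (7).
    d       : current estimate (dx, dy)
    dfds    : DFD value at each neighboring pel x_{k-j}
    grads   : (gx, gy) previous-frame gradient at each displaced pel
    weights : W_j, assumed nonnegative and summing to 1
    eps     : step size"""
    ux = sum(w * e * g[0] for w, e, g in zip(weights, dfds, grads))
    uy = sum(w * e * g[1] for w, e, g in zip(weights, dfds, grads))
    return (d[0] - eps * ux, d[1] - eps * uy)
```

With a single pel and unit weight the update reduces to eq. (4); with several pels the products of error and gradient are averaged before the step is taken, which is the smoothing effect noted above.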

The second displacement algorithm is a least-mean-square estimator
based on the intensity in the previous frame at a location
displaced by the old estimate of displacement. Thus, assuming eq. (1),
the displaced frame difference of eq. (2) can be written as

$$\mathrm{DFD}(\mathbf{x}_k, \hat{\mathbf{D}}_i) = I(\mathbf{x}_k - \mathbf{D}, t - \tau) - I(\mathbf{x}_k - \hat{\mathbf{D}}_i, t - \tau)$$

$$= -(\mathbf{D} - \hat{\mathbf{D}}_i)^{T}\, \nabla I(\mathbf{x}_k - \hat{\mathbf{D}}_i, t - \tau) + \text{other terms}, \tag{8}$$

where the superscript T on a vector or a matrix denotes its transpose. 
Quantity $(\mathbf{D} - \hat{\mathbf{D}}_i)$ can be estimated by standard techniques of linear
least-mean square by using $\mathrm{DFD}(\cdot, \cdot)$ on the left-hand side as an
observation and treating the higher order terms on the right-hand side
as noise.* This gives us an algorithm of the form

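$$\hat{\mathbf{D}}_{i+1} = \hat{\mathbf{D}}_i - \left[ \sum_{j=0}^{p} \nabla I(\mathbf{x}_{k-j} - \hat{\mathbf{D}}_i, t - \tau)\, \nabla I^{T}(\mathbf{x}_{k-j} - \hat{\mathbf{D}}_i, t - \tau) \right]^{-1} \left[ \sum_{j=0}^{p} \mathrm{DFD}(\mathbf{x}_{k-j}, \hat{\mathbf{D}}_i)\, \nabla I(\mathbf{x}_{k-j} - \hat{\mathbf{D}}_i, t - \tau) \right]. \tag{9}$$
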

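The matrix inverse in the above equation can be approximated by

$$\frac{1}{\Delta_i} \begin{bmatrix} \sum_{j=0}^{p} \mathrm{LDIF}_j^2 & -\sum_{j=0}^{p} \mathrm{LDIF}_j\, \mathrm{EDIF}_j \\ -\sum_{j=0}^{p} \mathrm{LDIF}_j\, \mathrm{EDIF}_j & \sum_{j=0}^{p} \mathrm{EDIF}_j^2 \end{bmatrix}$$

with

$$\Delta_i = \sum_{j=0}^{p} \mathrm{EDIF}_j^2 \sum_{j=0}^{p} \mathrm{LDIF}_j^2 - \left[ \sum_{j=0}^{p} \mathrm{EDIF}_j\, \mathrm{LDIF}_j \right]^2, \tag{10}$$

where

$\mathrm{EDIF}_j$ = Element difference at $(\mathbf{x}_{k-j} - \hat{\mathbf{D}}_i)$ in the previous frame
(approximating the horizontal component of $\nabla I$).
$\mathrm{LDIF}_j$ = Line difference at $(\mathbf{x}_{k-j} - \hat{\mathbf{D}}_i)$ in the previous frame
(approximating the vertical component of $\nabla I$).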
In evaluating the matrix inverse, one must be careful that the matrix 
is not singular. Singularity can be a result of not averaging enough 
samples. If, on the other hand, a large number of samples are averaged, 
then displacement averaging may result. To avoid this, whenever $M_i$,
the 2 × 2 matrix being inverted in eq. (9), came close to being singular
(as indicated by its determinant), no
update was performed. The second term of eq. (9) is given by

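$$\begin{bmatrix} \sum_{j=0}^{p} \mathrm{DFD}(\mathbf{x}_{k-j}, \hat{\mathbf{D}}_i)\, \mathrm{EDIF}_j \\ \sum_{j=0}^{p} \mathrm{DFD}(\mathbf{x}_{k-j}, \hat{\mathbf{D}}_i)\, \mathrm{LDIF}_j \end{bmatrix}. \tag{11}$$
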



Since $M_i$ is only a 2 × 2 matrix, no great savings may result in
computations by using the matrix inversion lemma. 5,6 
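
The least-mean-square update of eqs. (9) to (11) can be sketched as below, with the 2 × 2 inverse written out explicitly and the update skipped when the determinant is small. Several things here are assumptions rather than the authors' implementation: the function name `lms_update`, the determinant threshold `det_min`, and the `eps` factor on the correction term (eq. (9) as printed shows no step size, although the text that follows reports a fixed ε for both algorithms).

```python
def lms_update(d, dfds, edifs, ldifs, eps=1.0, det_min=1e-6):
    """One least-mean-square update in the spirit of eq. (9).
    d      : current estimate (dx, dy)
    dfds   : DFD at each neighboring pel
    edifs  : element differences (horizontal gradient samples)
    ldifs  : line differences (vertical gradient samples)
    eps    : scale on the correction term (assumed, see lead-in)
    det_min: hypothetical threshold below which the matrix is treated as singular"""
    see = sum(e * e for e in edifs)
    sll = sum(l * l for l in ldifs)
    sel = sum(e * l for e, l in zip(edifs, ldifs))
    det = see * sll - sel * sel                  # determinant, as in eq. (10)
    if abs(det) < det_min:
        return d                                 # near-singular: no update
    bx = sum(f * e for f, e in zip(dfds, edifs))  # first row of eq. (11)
    by = sum(f * l for f, l in zip(dfds, ldifs))  # second row of eq. (11)
    ux = (sll * bx - sel * by) / det              # inverse matrix times eq. (11)
    uy = (-sel * bx + see * by) / det
    return (d[0] - eps * ux, d[1] - eps * uy)
```

If the displaced frame differences follow the linear model of eq. (8) exactly and `eps` is 1, a single call recovers the true displacement; when all gradient samples are parallel the determinant vanishes and the estimate is returned unchanged.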



* Linear least-mean-square estimation can be used by assuming that the second term
of the right-hand side of eq. (9), i.e., $\nabla I(\mathbf{x}_k - \hat{\mathbf{D}}_i, t - \tau)$, does not "strongly" depend upon
$\hat{\mathbf{D}}_i$.
The ε in both algorithms was kept fixed. For algorithm 1, the best
ε was found to be 1/128 among the set of ε that were tried. Similarly
for algorithm 2, the best ε was found to be 1/16. The update term for
both the algorithms was also limited to make sure that no individual 
iteration changed the displacement estimate by more than a certain 
amount. In algorithm 1, the update term was limited to (0.2) pels/field, 
whereas for algorithm 2, it was limited to (0.08) pels/field. 
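
One way to read the limit on the update term is as a clamp applied to each component before it is subtracted from the estimate. The text does not say whether the limit applies per component or to the vector magnitude, so the per-component form below is an assumption, and `limited_step` is a hypothetical name.

```python
def limited_step(d, update, max_step):
    """Subtract `update` from the estimate `d` after clamping each component
    of the update to the range [-max_step, +max_step] (per-component clamp assumed)."""
    clipped = tuple(max(-max_step, min(max_step, u)) for u in update)
    return tuple(di - ui for di, ui in zip(d, clipped))
```

For example, with `max_step=0.2` an update of 0.5 pel in one component would move the estimate by only 0.2 pel in that iteration.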



III. SIMULATION RESULTS 

We simulated the above two algorithms on two types of scenes. The 
first scene is a synthetic scene which was computer-generated. It is a 
damped radial cosine in intensity with a radius of 60 pels which 
translated from frame to frame by a given amount. The pattern is 
described mathematically by the intensity function 

$$I(R) = 100 \cdot \exp(-0.01R)\cos(2\pi R/P) + 128; \quad 0 < R < 60, \tag{12a}$$

where $R$ is the radial distance from the center (taken to be (100, 100))
and

$$P = (1 - R/60)\,10 + 10. \tag{12b}$$
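
For readers who want to regenerate a comparable test pattern, eq. (12) can be evaluated directly as sketched below. The value used outside the 60-pel radius is not given in the text, so the flat background of 128 is an assumption, as are the function name `synthetic_intensity` and the `shift` argument used to translate the pattern between frames.

```python
import math


def synthetic_intensity(x, y, shift=(0.0, 0.0), center=(100.0, 100.0), background=128.0):
    """Intensity of the damped radial cosine of eq. (12) at pel (x, y).
    shift      : translation of the pattern in pels (hypothetical parameter)
    background : value returned for R >= 60 (assumed; not stated in the paper)"""
    r = math.hypot(x - shift[0] - center[0], y - shift[1] - center[1])
    if r >= 60.0:
        return background
    period = (1.0 - r / 60.0) * 10.0 + 10.0          # eq. (12b)
    return 100.0 * math.exp(-0.01 * r) * math.cos(2.0 * math.pi * r / period) + 128.0


def synthetic_frame(size=256, shift=(0.0, 0.0)):
    """A size x size frame of the pattern as a list of rows."""
    return [[synthetic_intensity(x, y, shift) for x in range(size)] for y in range(size)]
```

Two frames generated with different `shift` values give a pair with a known, spatially uniform displacement against which an estimator can be checked.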

This function is displayed on a 256 × 256 element raster in two
interlaced fields of 128 lines each. This pattern is shown in Fig. 1. The 




Fig. 1 — Synthetic image used in simulations. This image is described by eq. (12) in 
Section III. 

Fig. 2 — Single frame from the scene Judy.

other scene, called Judy, is a head-and-shoulders view of a person 
engaged in active conversation. This consisted of 60 frames obtained 
by Nyquist rate sampling of a video signal having 1-MHz bandwidth. 
Each sample was quantized uniformly to 8 bits. One frame of this 
scene is shown as Fig. 2. 

Various configurations of neighboring picture elements were used to 
evaluate the error terms. Referring to Fig. 3, let the prediction be 
evaluated at the picture element X. The error terms are evaluated at 
picture elements with the following five configurations. 

Configuration 1 — Error term consists of error only at element K. 
This is similar to our previous algorithm, 1,2 and is included here for the 
purpose of comparison. 

Configuration 2 — The error term is made up of the displaced frame 
differences at {C, D, J, K, L} . 

Configuration 3 — The error term is made up of displaced frame 
differences at {A, B, X, C, K} . This uses certain picture elements not 
yet available to the receiver and is included only to evaluate the effect 
of knowledge of such picture elements on the displacement estimator. 

Configuration 4 — The error term is made of displaced frame differ- 
ences at {I, J, K, L, M, O, P, Q} . We note that this configuration uses 
picture elements only from the previous lines in the same field. 
Configuration 5 — Here the error term is made up of displaced frame
differences at {C, D, E, F, H, I, J, K, L, M, N, O, P, Q}. This uses
previously transmitted picture elements from the present line as well 
as from the last two lines. 

Many other configurations were tried; however, no interesting con- 
clusions could be reached for these others. In some of these cases, 
weighted errors with unequal weights were used. 

3.1 Results from synthetic scenes

In this case, only the quality of the displacement estimators was 
judged with no reference to its usefulness for coding. Displacement 
estimators were initialized to zero at the leftmost element of each scan 
line, and recursions were carried out from pel to pel within a scan line. 
Figure 4 is a plot of the normalized displacement error 

$$\| \hat{\mathbf{D}}_i - \mathbf{D} \| \,/\, \| \hat{\mathbf{D}}_0 - \mathbf{D} \|$$

for algorithm 1, and Fig. 5 shows it for algorithm 2. It is clear from 
these figures that inclusion of more picture elements in the error term 
improves the convergence and the steady-state error. For example, 
with algorithm 1, configuration 2 decreases the normalized displace- 
ment error to 0.06 compared to 0.5 for configuration 1. 

For algorithm 1, configurations that use picture elements from only
the previous line do not perform as well as configurations of picture
elements from the present and previous line. For algorithm 2, in
general, the convergence is much faster than for algorithm 1, and low
steady-state error is obtained. Configuration 4, for example, attains in 
40 iterations a normalized displacement error of 0.09 for algorithm 1 
and 0.05 for algorithm 2. Configuration 5 does significantly better than 

Fig. 3 — Configuration of picture elements for weighted error calculation. Pel X is
being predicted. Dotted lines denote scan lines from previous field.

Fig. 4 — Normalized displacement error ($\| \hat{\mathbf{D}}_i - \mathbf{D} \| / \| \hat{\mathbf{D}}_0 - \mathbf{D} \|$) is plotted against the
iteration number for algorithm 1. Four configurations (1, 2, 3, 4) are shown. All iterations
are started from the left-most pel in a scan line, $\hat{\mathbf{D}}_0$ is taken to be zero, and $\mathbf{D}$ is taken
to be 2 pels per frame in the horizontal direction.

configuration 1 for algorithm 2. In 60 iterations, it reduces the normalized
displacement error by a factor of more than 30 compared to
configuration 1. It is seen, therefore, that for both algorithms the 
increase of number of picture elements in the error term is a significant 
advantage and appears to be a controlling factor. We also tried to use 
weights in the calculation of the error term which were inversely 
proportional to the exponential of the distance from the picture ele- 
ment X. It was found that there was no significant improvement using 
such a set of weights. One can conclude, then, that in the synthetic 
scene where the displacement is spatially uniform, proximity to the 
picture element X is not as important as the number of picture 
elements used in averaging the error. We also made some simulations 
in which the synthetic scene was corrupted by additive noise. In these 
cases, the convergence and the steady-state error generally became 
worse compared to the case of no noise. However, the additional 
improvement by using more picture elements was higher than in the 
case of no noise. Thus, the relative improvement by using configuration 
5 compared to configuration 1 was higher (normalized error decreased
by a factor of 40, compared to the factor of 30 for the no-noise case). It was
also found that, for algorithm 2, the number of times matrix $M_i$ came
close to being singular was greater for configuration 2 than for
configuration 5. This supports our reasoning that averaging a large
number of pels prevents the matrix $M_i$ from becoming singular.

3.2 Results for real scenes 

Both algorithms 1 and 2 were simulated with real scenes as the 
input. This was done to evaluate their usefulness for coding. Thus, a 
sequence of pictures was coded by a motion-compensated coder of the 
same type as in Ref. 2, except that the displacement was estimated by 
the new extensions. We used configurations 1, 3, and 5. Configuration 
3 was used only for algorithm 1 and configuration 5 was used for both 
algorithms. The DPCM quantizer had 35 levels and is given in Fig. 10 of
Ref. 1. Prediction error was sent in a quantized form only if it was 
higher than a threshold of 3 (out of 255 corresponding to 8-bit signals). 
This gave a picture quality which appeared reasonable; that is, the 
coding degradations were visible but not annoying. 

Fig. 5 — Normalized displacement error against the iteration number for algorithm 2.

Fig. 6 — Plots of bits per field are given for the scene Judy. Configuration 1 corresponds
to our previous motion-compensation algorithm (Ref. 2). Configuration 5 is shown for
the two algorithms (GSE and LMS). Configuration 3 uses some pels for displacement
estimation that are not yet available to the receiver.

The total bits per frame were calculated by adding bit requirements 
for quantized prediction error (whenever it was sent) and addressing. 
Error bits were computed by evaluating the entropy of the quantized 
prediction error, and the address bits were computed by one-dimen- 
sional run length coding of the unpredictable picture elements along a 
scan line. To reduce the computational burden, only a part (240 × 240
array) of the frame was coded. Figure 6 shows the plots of bits per 
field as a function of the field number for the 120 fields of the scene 
Judy. Configuration 1, which is our old motion-compensation algo- 
rithm, 2 is plotted for comparison. It is seen from Fig. 6 that configu- 
ration 5 for both algorithms is about 10 percent better than configu- 
ration 1. There is a slight preference for algorithm 2 over algorithm 1. 
Configuration 3, which uses some picture elements not yet available to 
the receiver, does about 20 percent better than does configuration 1. It 
is seen that, although configuration 5 gives dramatic improvement in 
convergence of displacement iterations for synthetic scenes, it does not 
appreciably improve the performance of the coder. We feel this may 
be a result of spatially nonuniform displacements in the real scene. 
The use of a larger number of picture elements for evaluation of the 
error term has the effect of averaging displacements which might not 
be so useful if these displacements are nonuniform. Another scene, 
Mike and Nadine, 1,2 was also processed with both the algorithms using 
configuration 5. It was found that bit rates decreased by about 12 
percent compared to the use of configuration 1 (our previous motion 
compensation algorithm). 
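
The error-bit count described above, the entropy of the quantized prediction error over the transmitted pels, can be estimated as in the following sketch. It assumes a first-order (memoryless) entropy over the quantizer output levels, which the text implies but does not state. The address bits are not covered, since the run-length code used is not specified here. The name `entropy_bits` is hypothetical.

```python
import math
from collections import Counter


def entropy_bits(symbols):
    """Total bits for a sequence of quantizer output levels, taken as
    (first-order entropy in bits/symbol) x (number of symbols)."""
    counts = Counter(symbols)
    n = len(symbols)
    per_symbol = -sum(c / n * math.log2(c / n) for c in counts.values())
    return per_symbol * n
```

Applied to the list of quantized errors actually sent in one field, this gives the error-bit contribution to the bits-per-field figure for that field.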

IV. CONCLUSIONS

We have presented in this paper extensions of our previous displace- 
ment estimation algorithms. These extensions allow us to recursively 
estimate the displacement of objects by minimizing the intensity
prediction error at several picture elements rather than only one, as in 
our previous algorithms. We have found that, for synthetic scenes 
where the displacement is spatially uniform, the extensions perform 
significantly better in terms of convergence and steady-state displace- 
ment error. However, in real scenes, where the displacement may be 
spatially nonuniform, only about 10-percent improvement in average 
bit rates is obtained compared to our previous motion-compensation 
schemes. 